The adoption of new technologies in the banking industry is continuous and is growing rapidly as digital channels replace over-the-counter transactions. At the same time, fraudulent transactions are damaging both the reputation and the profitability of the industry. To counter this, a real-time genetic and analytical tool is required. This motivates the present work, which uses a suitable algorithm to analyze each customer's transaction patterns in order to detect money laundering in bank accounts. The central challenge is to classify and cluster all transactions and customers, which together form an exceptionally large database. The effectiveness of prevention depends entirely on this step, since filtering out the necessary data increases the accuracy of the approach. A decision tree classification algorithm forms the basis of this work. Each balanced decision tree uses a weighted average to identify the risk factor and the cluster index. In total, thirty indicative alert indicators are loaded into the decision tree and further clustered into five groups. Based on the outcome of the decision tree and its accumulated weights, Data Cube outlier analysis identifies the relevant records that may lead to money laundering in the near future.
Introduction
1. Overview
The research focuses on detecting money laundering activities by analyzing large-scale banking databases containing customer, transaction, and risk data.
The approach begins with data cleaning and identification of indicative alert indicators that may signify suspicious or fraudulent activities.
2. Indicative Alert Indicators
The study lists 30 alert indicators across four main categories:
Customer Base Alerts (D1–D7):
Include issues such as failed onboarding, fake or unverifiable KYC documents, wrong addresses, untraceable beneficiaries, and customers with criminal backgrounds.
Risk Base Alerts (D8–D10):
Identify links to terrorism, criminal cases, or negative media reports.
Transaction & Risk Base Alerts (D11–D27):
Cover suspicious transaction behaviors like tense or vigilant customers, inconsistent information, multiple identities, unknown third parties, unusual fund sources, irrational transaction patterns, unauthorized foreign remittances, and declining cross-border payments.
Agency Alerts (D28–D30):
Include grievances or alerts triggered by agents or other financial institutions.
These indicators act as input features for automated decision-making and serve as the foundation for machine learning (decision tree) operations to detect potential laundering patterns.
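As a minimal sketch of how the indicators could serve as input features, the catalog below maps each identifier D1 to D30 to its category using the ranges listed above and encodes a set of raised alerts as a 30-element 0/1 vector. The category labels, `INDICATOR_RANGES`, `category_of` and `to_feature_vector` are hypothetical names introduced for illustration, not part of the described system.

```python
# Hypothetical encoding of the 30 alert indicators (D1-D30) by category.
# The ranges follow the list above; the category labels are illustrative.
INDICATOR_RANGES = {
    "customer": range(1, 8),             # D1-D7
    "risk": range(8, 11),                # D8-D10
    "transaction_risk": range(11, 28),   # D11-D27
    "agency": range(28, 31),             # D28-D30
}


def category_of(indicator_id: str) -> str:
    """Return the category of an indicator id such as 'D12'."""
    if not indicator_id.startswith("D"):
        raise ValueError(f"not an indicator id: {indicator_id!r}")
    n = int(indicator_id[1:])
    for category, ids in INDICATOR_RANGES.items():
        if n in ids:
            return category
    raise ValueError(f"unknown indicator: {indicator_id!r}")


def to_feature_vector(raised: set[str]) -> list[int]:
    """Encode raised alerts as a 0/1 vector ordered D1..D30."""
    return [1 if f"D{i}" in raised else 0 for i in range(1, 31)]
```

For example, `to_feature_vector({"D2", "D15"})` sets positions 2 and 15 (indices 1 and 14) and leaves the other 28 at zero, giving a fixed-width input for the decision tree.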
3. Decision Tree Model
The decision tree algorithm is central to the detection process.
Each node represents a decision rule derived from an alert indicator.
Weights are assigned to nodes:
+1 for customer data deviations
+2 for transaction-related deviations
As a record traverses the branches of the tree, its weight accumulates with each matched rule.
A higher cumulative weight indicates a greater probability of money laundering.
Even when a rule does not match, the record continues traversing the tree so that other possible deviations are still detected.
The algorithm thus helps classify transactions or customers into normal or suspicious groups, depending on their deviation patterns and cumulative weight.
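The weighting scheme above can be sketched as follows, under stated assumptions. Because an unmatched record keeps traversing, every rule node is visited, so the sketch flattens the tree into a list of rule nodes. The attribute names, the mapping of D2, D3, D17 and D22 to concrete tests, and the threshold are all hypothetical; only the +1/+2 weights come from the text.

```python
from dataclasses import dataclass
from typing import Any, Callable

Record = dict[str, Any]  # hypothetical: attribute name -> value


@dataclass
class RuleNode:
    indicator: str                   # alert indicator id, e.g. "D3"
    rule: Callable[[Record], bool]   # returns True when the record deviates
    weight: int                      # +1 customer deviation, +2 transaction deviation


def score(record: Record, nodes: list[RuleNode]) -> tuple[int, list[str]]:
    """Accumulate the weight of every matched rule.

    A non-matching rule does not end the walk: the record is still checked
    against every remaining node, so later deviations are counted too.
    """
    total, matched = 0, []
    for node in nodes:
        if node.rule(record):
            total += node.weight
            matched.append(node.indicator)
    return total, matched


def classify(total: int, threshold: int) -> str:
    """Split records into normal vs suspicious by cumulative weight."""
    return "suspicious" if total >= threshold else "normal"


# Hypothetical rules: the ids come from the indicator list, but the attribute
# names and the concrete test attached to each id are invented for illustration.
NODES = [
    RuleNode("D2", lambda r: not r["kyc_verified"], 1),
    RuleNode("D3", lambda r: not r["address_verified"], 1),
    RuleNode("D17", lambda r: r["cash_deposits_30d"] > 10, 2),
    RuleNode("D22", lambda r: r["foreign_remittance"] and not r["remittance_authorized"], 2),
]
```

A record with unverified KYC and 14 cash deposits in 30 days matches D2 (+1) and D17 (+2), so `score` returns `(3, ["D2", "D17"])`, and with a hypothetical threshold of 3 it is classified as suspicious.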
4. Hyperplane and Mathematical Modeling
Each data point a is evaluated against a hyperplane in multidimensional space, hy(a) = W · a + b, defined by:
A weight vector (W) with one component per attribute
An offset (b) from the origin
These hyperplanes help define decision boundaries in the decision tree, aiding in visualization and separation of normal versus suspicious patterns.
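A minimal sketch of evaluating hy(a) = W · a + b and reading off which side of the boundary a point falls on. The convention that the positive side is "suspicious", the three-attribute layout and the example values of W and b are assumptions for illustration only.

```python
def hy(a: list[float], W: list[float], b: float) -> float:
    """Evaluate hy(a) = W . a + b for a data point a."""
    if len(a) != len(W):
        raise ValueError("a and W must have the same number of attributes")
    return sum(w * x for w, x in zip(W, a)) + b


def side(a: list[float], W: list[float], b: float) -> str:
    """Assumed convention: the positive side of the hyperplane is suspicious."""
    return "suspicious" if hy(a, W, b) > 0 else "normal"
```

With hypothetical attributes (customer deviations, risk alerts, transaction deviations), W = [1.0, 1.0, 2.0] and b = -3.0, the point [1, 0, 1] lies exactly on the boundary (hy = 0.0, treated as normal), while [2, 0, 1] gives hy = 1.0 and falls on the suspicious side.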
5. Data Cube Outlier Analysis
To handle high-dimensional data, results from the decision tree are visualized through a Data Cube model:
Each cube dimension represents a specific attribute or partition (e.g., customer, transaction, risk).
Outliers (i.e., abnormal data points or customers) appear as distinct cube surfaces or clusters, making it easier to spot deviations visually.
The cube allows for multi-dimensional analysis, linking anomalies across various data attributes.
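One way to realize the multi-dimensional view is to aggregate weights over every combination of dimensions (one cuboid per group-by) and then flag cells whose total is far above the other cells of the same cuboid. This is a sketch under assumptions: the dimension names are hypothetical, and the z-score rule is a swapped-in outlier test, not necessarily the one used in this work.

```python
from collections import defaultdict
from itertools import combinations
from statistics import mean, pstdev

DIMENSIONS = ("customer", "txn_type", "risk_band")  # hypothetical dimension names


def build_cube(rows: list[dict]) -> dict[tuple, dict[tuple, float]]:
    """Sum the 'weight' field for every group-by combination (cuboid) of DIMENSIONS."""
    cube = {}
    for k in range(len(DIMENSIONS) + 1):
        for dims in combinations(DIMENSIONS, k):
            cells = defaultdict(float)
            for row in rows:
                cells[tuple(row[d] for d in dims)] += row["weight"]
            cube[dims] = dict(cells)
    return cube


def outlier_cells(cuboid: dict[tuple, float], z: float = 2.0) -> list[tuple]:
    """Return cells whose weight lies more than z standard deviations above the mean."""
    values = list(cuboid.values())
    if len(values) < 2:
        return []
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [key for key, v in cuboid.items() if (v - mu) / sigma > z]
```

With three dimensions the cube holds 2^3 = 8 cuboids, from the grand total `cube[()]` down to the full (customer, txn_type, risk_band) cells, so an anomaly can be traced from a single customer up to its transaction type or risk band.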
The algorithm for the data cube:
Reads each partition and calculates weights.
Flags records with significant deviations.
Sorts and maps weighted results into cube surfaces.
Highlights potential laundering cases visually.
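The four steps above can be sketched as follows. The input layout (partition name mapped to record ids and their deviation weights), the threshold, and the choice of one cube surface per partition are hypothetical; visual highlighting is reduced to returning each surface ordered with the heaviest records first.

```python
from typing import NamedTuple


class Flagged(NamedTuple):
    partition: str
    record_id: str
    weight: int


def cube_outlier_analysis(partitions: dict[str, dict[str, list[int]]],
                          threshold: int) -> dict[str, list[Flagged]]:
    """Hypothetical sketch: partition -> {record_id: [deviation weights]}."""
    flagged = []
    for name, records in partitions.items():        # read each partition
        for record_id, weights in records.items():
            total = sum(weights)                      # calculate the record's weight
            if total >= threshold:                    # flag significant deviations
                flagged.append(Flagged(name, record_id, total))
    # Sort by weight (heaviest first); ties broken by partition and id.
    flagged.sort(key=lambda f: (-f.weight, f.partition, f.record_id))
    surfaces: dict[str, list[Flagged]] = {}
    for f in flagged:                                 # map results onto cube surfaces
        surfaces.setdefault(f.partition, []).append(f)
    # The first entries of each surface are the cases to highlight.
    return surfaces
```

For instance, with a threshold of 2, a transaction record carrying weights [2, 2, 2] (total 6) heads the transaction surface, ahead of a record with a single +2 deviation.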
6. Algorithm Design
Two core algorithms were implemented:
(a) Decision Tree Construction
Builds tree nodes and leaves based on data deviations.
Assigns and increments weights.
Flags transactions as “Money Laundering” when thresholds are met.
(b) Data Cube Outlier Analysis
Converts tree results into multidimensional cubes.
Sorts deviations by weight and significance.
Visualizes suspicious clusters for easier detection.
Conclusion
The algorithm above matches all attributes by weight and flag value to assign each record to the relevant surface of the data cube. The weight of the risk value is the most significant factor and is computed within the data cube algorithm. Based on this weight, each case is assigned a high, medium, or low priority during the data cube analysis. Users can view the data cube surfaces by weight, cluster, or key identifiers. The data cube can also be migrated according to these values for user-defined sets.
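A minimal sketch of the priority assignment, assuming fixed cut-offs on the cumulative risk weight. The thresholds 8 and 4 and the function names are hypothetical; the text does not state how the high, medium, and low levels are separated.

```python
def priority(weight: float, high: float = 8, medium: float = 4) -> str:
    """Map a cumulative risk weight to a priority band (thresholds are hypothetical)."""
    if high <= medium:
        raise ValueError("high threshold must exceed medium threshold")
    if weight >= high:
        return "high"
    if weight >= medium:
        return "medium"
    return "low"


def triage(cases: dict[str, float]) -> list[tuple[str, str, float]]:
    """Order cases by weight (heaviest first) and attach their priority band."""
    ranked = sorted(cases.items(), key=lambda kv: -kv[1])
    return [(case_id, priority(w), w) for case_id, w in ranked]
```

For example, `triage({"A": 2, "B": 10, "C": 5})` ranks B (high), then C (medium), then A (low).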